Abstract

The biggest cost of computing with large matrices in any modern computer is related to memory latency
and bandwidth. The average latency of modern RAM reads is 150 times greater than a clock step of the
processor [1]. Throughput is a little better, but still 25 times slower than the CPU can consume. The
application of bitstring compression allows larger matrices to be moved entirely to the cache memory
of the computer, which has much better latency and bandwidth (the average latency of the L1 cache is 3 to 4
clock steps). This allows for massive performance gains, as well as the ability to simulate much larger
models efficiently. In this work, we propose a methodology to compress matrices in such a way that they
retain their mathematical properties. Considerable compression of the data is also achieved in the process,
thus allowing the computation of much larger linear problems within the same memory constraints
when compared with the traditional representation of matrices.

Introduction

Data compression is traditionally used to reduce storage resource usage and/or transmission costs [5].
Compression techniques can be classified into lossy and lossless. Examples of lossy data compression are
MP3 (audio), JPEG (image) and MPEG (video). In this paper we discuss the use of lossless compression
for numerical data structures, such as numerical arrays, to achieve compression without losing the
mathematical properties of the original data.

Lossless compression methods usually exploit redundancies present in the data in order to find a
shorter way of describing the same information content. For example, a dictionary-based compression
scheme stores a given word once and then only the positions in which it occurs in a document, thus
saving the space required to store all its repetitions [3].

Any kind of compression incurs some computational cost. Such costs often have to be paid twice, since
the data needs to be decompressed to be used for its original purpose. Sometimes the computational costs are
irrelevant, but the need to decompress the data before using it means that the space saved by compression
must be available again when the data is decompressed, thus partially negating the advantages of compression.

Most, if not all, existing lossless compression methods were developed under the following usage
paradigm: produce → compress → store → decompress → use. The focus of the present work is to
allow a slightly different usage: produce → compress → perform mathematical manipulations → decompress
(only for human reading).

With the growth of data volumes and analytical demands, creative solutions are needed to store data
efficiently as well as to consume it on demand. This issue is present in many areas of application, ranging from
business to science [4], and is being called the Big Data phenomenon. In the world of Big Data, the need
to analyze data immediately after it comes into existence has become the norm, and this analysis must
take place efficiently within the confines of (RAM) memory. This kind of analysis is now known
as streaming data analysis [5]. Given a sufficiently dense stream of data, compression and decompression
costs may become prohibitive. So having a way to compress data and keep it compressed for the entire
course of the analytical pipeline is very desirable.

This paper will focus solely on numerical data which, for the purpose of the applications, is organized as
matrices, the most common data structure found in computational data analysis environments. The
matrices compressed according to the methodologies proposed here should be able to undergo the same
mathematical operations as the original uncompressed matrices, e.g. linear algebra manipulations. This
way, the cost of compression is reduced to a single compression event, with no need for decompression
except when displaying the results for human reading. The idea of operating with compressed arrays is
relatively new [6], and it has yet to find mainstream applications in the field of numerical computations.
One application which employs a form of compression is sparse matrix linear algebra [7];
in this case there is no alteration in the standard encoding of the data, but only the non-zero elements
of the matrices are stored and operated upon.

Larger-than-RAM data structures can render traditional analytical algorithms impracticable. Currently,
the technique most commonly used when dealing with large matrices for numerical computations
is memory mapping [8, 9]. In memory mapping the matrix is allocated in a contiguous virtual address
space which extends from memory onto disk. Thus, larger-than-memory data structures can be manipulated
as if they were in memory. This technique has a big performance penalty due to the lower access speed
of disk when compared to RAM.

In this paper we present two methods for the lossless compression of (numerical) arrays. The methods
involve encoding the numbers as strings of bits of variable length. The methods resemble the
arithmetic coding algorithm [10], but are cheaper to compute. We describe the process of compression and
decompression, and study their efficiency under different applications. We also discuss the efficiency of
the compression as a function of the distribution of the elements of the matrix.

Methods 

Matrix compression 

To maintain mathematical equivalence with the original data for any arithmetic operation, we need to
maintain the structure of the matrix, i.e., the ability to access any element given its row i and column
j, and also the numeric nature of its elements. In order to achieve compression we exploit
inefficiencies in the conventional way matrices are allocated in memory. The examples in this paper will
be restricted to matrices with positive integer elements.

The compression method is as follows. Let $M_{r \times c}$ be a matrix, in which r is the number of rows
and c the number of columns. Each element of this matrix, called $m_{ij}$, is a positive integer. In digital
computers, all information is stored as binary code (base 2 numbers). However, the conventional way to
store arrays of integers is as a sequence of fixed-size memory blocks (a power-of-2 number of bits), one for
each element. The maximum size of a block is equal to the word size of the processor, which for most
current CPUs is 64 bits. Some special numbers, such as complex numbers, may be encoded as two blocks
instead of one. The size of the chunk of memory allocated to each number determines its maximum
size (for integers) or its precision (for floating-point numbers). So for matrix M, the total memory
allocated, assuming chunks of 64 bits, is given by $B = r \times c \times 64$.

The number of bits allocated, B, is larger than the absolute minimum number of bits required to
represent all the elements of M, since smaller integers, when converted to base 2, require fewer digits.
From now on, the numerical base will be notated explicitly when necessary to avoid confusion
between binary and decimal integers.

Let's consider an extreme example: a matrix composed exclusively of 0s and 1s (base 10). If the
matrix type is set to 64-bit integers, 63 bits will be wasted per element of the matrix, since the minimum
number of bits needed to store such a matrix is $b = r \times c \times 1$. The potential saving in bits is
therefore $B - b = r \times c \times 63$.

So it is evident that for any matrix whose greatest element requires fewer than 64 bits (or fewer than the
bits of the fixed type of the matrix) to be represented, potential memory savings grow linearly with the size of the
matrix.
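A minimal Python sketch of this arithmetic, assuming 64-bit chunks as in the text; the helper names `conventional_bits` and `minimum_bits` are hypothetical and only illustrate the formulas for B and b.

```python
# Hypothetical helpers: compare the conventional allocation B = r * c * 64
# with the minimum b = r * c * (bits per element).
WORD_BITS = 64  # assumed fixed chunk size per element


def conventional_bits(r: int, c: int) -> int:
    """B: one 64-bit chunk per element."""
    return r * c * WORD_BITS


def minimum_bits(r: int, c: int, bits_per_element: int) -> int:
    """b: every element stored with exactly `bits_per_element` bits."""
    return r * c * bits_per_element


r, c = 1000, 1000
B = conventional_bits(r, c)
b = minimum_bits(r, c, 1)  # a matrix of 0s and 1s needs 1 bit per element
print(B, b, B - b)  # prints 64000000 1000000 63000000
```

The saving B - b grows with r × c, which is the linear growth described above.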



Method 1: The Supreme Minimum (SM) 

The SM method consists of determining the value of the greatest element of matrix M, which coincides
with its supremum, $\max M = \sup M$, and determining the minimum number of bits, $b(\sup M)$ (Equation
(1)), required to store it. We will use capital roman letters to denote uncompressed matrices and the
corresponding lower case letter for the compressed version.

$$b(\sup M) = \begin{cases} 1, & \text{if } \sup M \in \{0, 1\} \\ \lfloor \log_2(\sup M) \rfloor + 1, & \text{if } \sup M > 1 \end{cases} \quad (1)$$

The allocation of memory still happens in the usual way, i.e., in fixed-size 64-bit chunks; only
now, in the space required for a single 64-bit integer, we can store, for example, an entire 8×8 matrix of
$0_{10}$ and $1_{10}$.
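Equation (1) and the 8×8 example can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: `bit_length` is a hypothetical helper built on Python's `int.bit_length()`, which for x > 1 equals ⌊log2(x)⌋ + 1 exactly, and the 0/1 matrix is flattened row by row with element number `pos` placed at bit `pos`.

```python
def bit_length(x: int) -> int:
    """Equation (1): minimum number of bits needed to store a non-negative integer."""
    if x < 0:
        raise ValueError("only non-negative integers are considered")
    if x in (0, 1):
        return 1
    # For x > 1, int.bit_length() equals floor(log2(x)) + 1, computed exactly.
    return x.bit_length()


# An 8x8 matrix of 0s and 1s, flattened row by row (row-major).
matrix = [[(i + j) % 2 for j in range(8)] for i in range(8)]
flat = [v for row in matrix for v in row]

word = 0
for pos, v in enumerate(flat):
    word |= v << pos  # element number `pos` occupies bit `pos` of the word

assert word < 2**64  # the whole matrix fits in one 64-bit integer
assert [(word >> pos) & 1 for pos in range(64)] == flat
```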

Let's look at a concrete example: suppose that the greatest value to be stored in a matrix M is
max M = 1023. Therefore, the number of bits required to represent it is 10 (1111111111). Let the first
8 elements of M be:

$$M = \begin{bmatrix} 900 & 1023 & 721 & 256 & 1 & 10 & 700 & 20 & \cdots \end{bmatrix} \quad (2)$$



These elements of M, in binary, are shown in Table 1. It is evident that the number of bits required
to represent any other element must be lower than or equal to 10. From now on, the minimum number of bits
required to represent a base 10 integer will be referred to as its bit-length.

Table 1. Some elements of M represented in binary base.

Element      Value   Binary       Bit-length
$M_{1,1}$    900     1110000100   10
$M_{1,2}$    1023    1111111111   10
$M_{1,3}$    721     1011010001   10
$M_{1,4}$    256     100000000    9
$M_{1,5}$    1       1            1
$M_{1,6}$    10      1010         4
$M_{1,7}$    700     1010111100   10
$M_{1,8}$    20      10100        5
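Table 1 can be regenerated with a few lines of Python; this sketch simply applies Equation (1) and Python's built-in binary formatting to the eight elements of Equation (2) (`first_row` and `rows` are illustrative names).

```python
def bit_length(x: int) -> int:
    # Equation (1): 0 and 1 need one bit; otherwise floor(log2(x)) + 1
    return 1 if x in (0, 1) else x.bit_length()


first_row = [900, 1023, 721, 256, 1, 10, 700, 20]  # elements M_{1,1} ... M_{1,8}
rows = [(j, v, format(v, "b"), bit_length(v)) for j, v in enumerate(first_row, start=1)]
for j, v, binary, bl in rows:
    print(f"M1,{j}  {v:>5}  {binary:>10}  {bl:>2}")
```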



To store matrix M, it first has to be converted to base 2 ($M_2$). Then it will be unraveled by column
(column-major, e.g. in Fortran) or by row (row-major, e.g. in C) and its elements will be written as fixed-size
adjacent chunks of memory. The size of each chunk is determined by the type associated with the
matrix (typically 64 bits, but always a power of 2).

According to the SM method, having determined that each element will require at most 10 bits, we
can divide the memory block corresponding to a single 64-bit integer into six 10-bit chunks, each of which
can hold a single element of M. These 64-bit blocks will be called bitstrings. The remaining 4 bits
will be used later. The number of bitstrings needed will be $\left\lfloor \frac{\dim(M) \times b(\sup M)}{64} \right\rfloor + 1$, where $\dim(M)$ is the
dimension of the matrix, i.e., its number of elements.
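A small Python comparison of the bitstring count, offered as a sketch: `bitstrings_formula` follows the expression above, while `bitstrings_exact` (a hypothetical helper) is the ceiling of the total number of bits divided by 64, which is what a contiguous layout with straddling chunks strictly needs. The two agree unless dim(M) × b(sup M) is an exact multiple of 64, in which case the formula reserves one spare bitstring.

```python
def bitstrings_formula(n_elements: int, width: int, word: int = 64) -> int:
    """floor(dim(M) * b(sup M) / 64) + 1, as written in the text."""
    return (n_elements * width) // word + 1


def bitstrings_exact(n_elements: int, width: int, word: int = 64) -> int:
    """Smallest number of 64-bit words holding all chunks back to back."""
    return -(-(n_elements * width) // word)  # integer ceiling division


print(bitstrings_formula(8, 10), bitstrings_exact(8, 10))  # 80 bits: 2 2
print(bitstrings_formula(64, 1), bitstrings_exact(64, 1))  # 64 bits: 2 1
```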
The final layout of the first 6 elements of m in the first bitstring can be seen in Equation (3):

$$\text{bitstring}_1 = 0000\,\underbrace{0000001010}_{10}\,\underbrace{0000000001}_{1}\,\underbrace{0100000000}_{256}\,\underbrace{1011010001}_{721}\,\underbrace{1111111111}_{1023}\,\underbrace{1110000100}_{900} \quad (3)$$

Here is a step-by-step description of the application of the SM method to matrix M:

1. Element $M_{1,1} = 900 = m_{1,1} = 1110000100_2$ is stored in the first 10-bit chunk of $\text{bitstring}_1$,
which corresponds to bits 0 to 9 (read from right to left).

$$\text{bitstring}_1 = \underbrace{000\cdots000}_{54}\,\underbrace{1110000100}_{900}$$

2. Element $M_{1,2} = 1023$ is allocated in the second chunk, from bit 10 to bit 19.

$$\text{bitstring}_1 = \underbrace{000\cdots000}_{44}\,\underbrace{1111111111}_{1023}\,\underbrace{1110000100}_{900}$$

3. Repeat for elements $M_{1,j}$ with j = 3, ..., 6, which are stored in the remaining chunks.

$$\text{bitstring}_1 = 0000\,\underbrace{0000001010}_{10}\,\underbrace{0000000001}_{1}\,\underbrace{0100000000}_{256}\,\underbrace{1011010001}_{721}\,\underbrace{1111111111}_{1023}\,\underbrace{1110000100}_{900}$$

4. Element $M_{1,7} = 700 = 1010111100_2$ does not fit in the remaining 4 bits of the first bitstring, so
it will straddle two bitstrings, i.e., it is divided into two segments, a and b: a (its 4 lowest-order bits, 1100)
is written on the first bitstring and b (its 6 highest-order bits, 101011) on the second. Element
$M_{1,8} = 20$ then follows b in the second bitstring.

$$\underbrace{\cdots\,\underbrace{0000010100}_{20}\,\underbrace{101011}_{b}}_{\text{bitstring}_2}\;\underbrace{\underbrace{1100}_{a}\,\underbrace{0000001010}_{10}\,\cdots}_{\text{bitstring}_1}$$

Please notice that bitstrings are written from right to left.
Thus the compressed matrix $m = M_2$ requires less memory than the conventional storage of M as a
64-bit integer array: the 8 elements above occupy 80 bits (two bitstrings) instead of 8 × 64 = 512 bits.
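The SM layout can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: Python integers stand in for 64-bit bitstrings, elements are packed in the order given (e.g. row-major), each bitstring is filled from the right, and `sm_compress`/`sm_get` are hypothetical names. Because every chunk has the same width, element `index` starts at bit `index × width`, so any element can be read directly without decompressing the others.

```python
WORD = 64
WORD_MASK = (1 << WORD) - 1


def bit_length(x: int) -> int:
    # Equation (1)
    return 1 if x in (0, 1) else x.bit_length()


def sm_compress(values):
    """Pack all values into fixed-width chunks of b(sup M) bits, filling each
    64-bit bitstring from the right and letting chunks straddle bitstrings."""
    width = bit_length(max(values))
    stream = 0  # a Python int used as an unbounded bit buffer
    for pos, v in enumerate(values):
        stream |= v << (pos * width)
    n_words = -(-len(values) * width // WORD)  # ceiling division
    words = [(stream >> (WORD * i)) & WORD_MASK for i in range(n_words)]
    return words, width


def sm_get(words, width, index):
    """Read element `index` directly from the packed bitstrings."""
    q, r = divmod(index * width, WORD)
    chunk = words[q] >> r
    if r + width > WORD:  # the element straddles two bitstrings
        chunk |= words[q + 1] << (WORD - r)
    return chunk & ((1 << width) - 1)


row = [900, 1023, 721, 256, 1, 10, 700, 20]
words, width = sm_compress(row)
print(width, len(words))  # 10 2
print([sm_get(words, width, i) for i in range(len(row))] == row)  # True
```

Element $M_{1,7} = 700$ ends up split exactly as in step 4: its low 4 bits (1100) occupy the top of the first word and its high 6 bits (101011) the bottom of the second.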

Method 2: Variable Length Blocks (VLB) 

In the SM method, there is still wasted space, since for elements smaller than the supremum a number
of bits remain unused.

In the VLB method, the absolute minimum number of bits is used to store each value. However, if
we are going to divide the bitstrings into variable-length chunks, we also need to reserve some extra bits
to represent the size of each chunk; otherwise the elements cannot be recovered once they are stored.

Let's use again the matrix described in Equation (2), where the largest element is 1023. Now,
instead of assigning one chunk of the bitstring to each element of m, we will assign two chunks: the first
will store the number of bits required to store the element and the second will store the actual element.
The first chunk will have a fixed size, in this case 4 bits. These 4 bits are the space required to store the
bit-length of sup M, in this case 10.

Let's go through VLB compression step-by-step. The largest element of M is 1023. Its bit-length is
10, which in turn is 4 bits long in base 2 (1010). Thus the fixed-size chunk is 4 bits long for every element.

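1. The first element, $M_{1,1} = 900$, requires 10 bits, so we write 10 in the first chunk and 900 in
the second.

$$\text{bitstring}_1 = \underbrace{000\cdots000}_{50}\,\underbrace{1110000100}_{\text{element}=900}\,\underbrace{1010}_{\text{bit-length}=10}$$

2. Do the same for the next element, $M_{1,2} = 1023$.

$$\text{bitstring}_1 = \underbrace{000\cdots000}_{36}\,\underbrace{1111111111}_{\text{element}=1023}\,\underbrace{1010}_{\text{bit-length}=10}\,\underbrace{1110000100}_{\text{element}=900}\,\underbrace{1010}_{\text{bit-length}=10}$$

3. Element $M_{1,3} = 721$ is also added, taking the bitstring to the following state.

$$\text{bitstring}_1 = \underbrace{000\cdots000}_{22}\,\underbrace{1011010001}_{721}\,\underbrace{1010}_{10}\,\underbrace{1111111111}_{1023}\,\underbrace{1010}_{10}\,\underbrace{1110000100}_{900}\,\underbrace{1010}_{10}$$

So far the VLB method is more wasteful than the SM method, since each element carries 4 extra header
bits. With $M_{1,4} = 256$, which needs only 9 bits, the stored values start to become shorter than the fixed
SM chunk, and the first net savings appear with the smaller elements that follow.

4. Element $M_{1,4} = 256$ is added.

5. Elements $M_{1,5} = 1$ and $M_{1,6} = 10$ are added, requiring a total of 13 bits instead of 20 with the SM
method. With the addition of these elements we require a second bitstring: the header of $M_{1,6}$ fills the
last 4 bits of $\text{bitstring}_1$ and its value goes to $\text{bitstring}_2$.

$$\text{bitstring}_1 = \underbrace{0100}_{4}\,\underbrace{1}_{1}\,\underbrace{0001}_{1}\,\underbrace{100000000}_{256}\,\underbrace{1001}_{9}\,\underbrace{1011010001}_{721}\,\underbrace{1010}_{10}\,\underbrace{1111111111}_{1023}\,\underbrace{1010}_{10}\,\underbrace{1110000100}_{900}\,\underbrace{1010}_{10}$$

$$\text{bitstring}_2 = \underbrace{000\cdots000}_{60}\,\underbrace{1010}_{10}$$

6. The remaining two elements, $M_{1,7} = 700$ and $M_{1,8} = 20$, are added in the second bitstring.

$$\text{bitstring}_2 = \underbrace{000\cdots000}_{37}\,\underbrace{10100}_{20}\,\underbrace{0101}_{5}\,\underbrace{1010111100}_{700}\,\underbrace{1010}_{10}\,\underbrace{1010}_{10}$$
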
We used a total of 91 bits (8 headers of 4 bits plus 59 bits of values) to store matrix m with the VLB
method, instead of 80 bits using the SM method. However, as shall be seen later, the VLB method is the more efficient of the two for most matrices.
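A minimal Python sketch of VLB packing and unpacking, assuming the layout of the steps above: a k-bit header with k = b(b(sup M)) holding the bit-length, then the element in exactly that many bits, everything written contiguously from the right across 64-bit words. `vlb_compress` and `vlb_decompress` are hypothetical names, and Python integers stand in for bitstrings.

```python
WORD = 64
WORD_MASK = (1 << WORD) - 1


def bit_length(x: int) -> int:
    # Equation (1)
    return 1 if x in (0, 1) else x.bit_length()


def vlb_compress(values):
    """Write each element as a k-bit header holding its bit-length, followed
    by the element itself in exactly that many bits, filling from the right."""
    k = bit_length(bit_length(max(values)))  # header width: b(b(sup M))
    stream, pos = 0, 0
    for v in values:
        b = bit_length(v)
        stream |= b << pos  # header
        pos += k
        stream |= v << pos  # payload
        pos += b
    n_words = -(-pos // WORD)  # ceiling division
    words = [(stream >> (WORD * i)) & WORD_MASK for i in range(n_words)]
    return words, k, pos


def vlb_decompress(words, k, n_elements):
    """Sequential scan: each header tells how many payload bits follow."""
    stream = 0
    for i, w in enumerate(words):
        stream |= w << (WORD * i)
    values, pos = [], 0
    for _ in range(n_elements):
        b = (stream >> pos) & ((1 << k) - 1)
        pos += k
        values.append((stream >> pos) & ((1 << b) - 1))
        pos += b
    return values


row = [900, 1023, 721, 256, 1, 10, 700, 20]
words, k, n_bits = vlb_compress(row)
print(k, n_bits, len(words))  # 4 91 2
print(vlb_decompress(words, k, len(row)) == row)  # True
```

Unlike SM, the position of an element now depends on the bit-lengths of all elements before it, so in this sketch reaching element i requires scanning the preceding headers.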

Compression Efficiency 

Compression efficiency depends on the data being compressed. Below, a formula for calculating compression
efficiency is derived for both methods. Both are based on the following ratio:

$$\eta = \frac{\text{bits allocated} - \text{bits used}}{\text{bits allocated}} \quad (4)$$

where bits allocated means the total number of bits required for standard storage of the matrix, without
compression, while bits used means the total number of bits required to store the matrix after compression.
From now on the efficiencies are denoted by $\eta_1$ for the SM method and by $\eta_2$ for the VLB method.
SM Method

Let $M_{r \times c}$ be the matrix we wish to compress. In comparison with a conventional allocation (64-bit
integers), we can apply Equation (4) to calculate the efficiency of the SM method:

$$\eta_1 = \frac{64 \times rc - b(\max M) \times rc}{64 \times rc} = \frac{64 - b(\max M)}{64} \quad (5)$$



As we see in Equation (5), $\eta_1$ does not depend on the size of the matrix, only on the bit-length of max M. If
$b(\max M) = 64$, $\eta_1$ is 0, i.e., no compression is possible. At the other extreme, if the matrix is composed
exclusively of 0s and 1s, maximal compression is achieved, $\eta_1 = 63/64 \approx 98.4\%$.
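Equation (5) reduces to a one-line function; this Python sketch (hypothetical name `eta1`) evaluates the example matrix with max M = 1023 and the two extremes discussed above.

```python
def bit_length(x: int) -> int:
    # Equation (1)
    return 1 if x in (0, 1) else x.bit_length()


def eta1(max_element: int, word: int = 64) -> float:
    """Equation (5): SM efficiency depends only on b(max M)."""
    return (word - bit_length(max_element)) / word


print(eta1(1023))       # 0.84375
print(eta1(1))          # 0.984375
print(eta1(2**64 - 1))  # 0.0
```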



VLB Method 

For the VLB method, compression depends on the value of each element of the matrix. In this method
bit-length variability affects the compression ratio, so the formula has to include this information.

Let the rc elements of the matrix $M_{r \times c}$ be divided into g groups, each with $f_i$ numbers of bit-length
$b_i = b(m_i)$. Thus $f_i$ is the frequency of each bit-length present in M. Let $k = b(b(\max M))$, i.e., the bit-length
of the bit-length of max M. The efficiency $\eta_2$ is shown below.

$$\eta_2 = \frac{64 \times rc - \sum_{i=1}^{g} (b_i + k) \times f_i}{64 \times rc} \quad (6)$$

We can further simplify Equation (6) to get a shorter expression for the compression ratio. Expanding
the sum gives

$$\eta_2 = \frac{64 \times rc - \sum_{i=1}^{g} b_i \times f_i - k \times \sum_{i=1}^{g} f_i}{64 \times rc}$$

Knowing that

$$\sum_{i=1}^{g} f_i = rc,$$

we can simplify the equation above, obtaining Equation (7):

$$\eta_2 = 1 - \frac{\sum_{i=1}^{g} b_i \times f_i}{64 \times rc} - \frac{k}{64} \quad (7)$$
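A Python sketch that evaluates $\eta_2$ both with Equation (6) and with the simplified Equation (7), to show that they agree; `eta2` is a hypothetical helper that groups elements by bit-length with `collections.Counter`. On the example row it gives 1 - 91/512, matching the 91 bits counted above, and on a matrix with half 1-bit and half 64-bit elements it gives about 0.3828.

```python
from collections import Counter


def bit_length(x: int) -> int:
    # Equation (1)
    return 1 if x in (0, 1) else x.bit_length()


def eta2(values, word=64):
    """Return eta_2 computed with Equation (6) and with Equation (7)."""
    rc = len(values)
    freq = Counter(bit_length(v) for v in values)  # f_i for each bit-length b_i
    k = bit_length(max(freq))                      # k = b(b(max M))
    eq6 = (word * rc - sum((b + k) * f for b, f in freq.items())) / (word * rc)
    eq7 = 1 - sum(b * f for b, f in freq.items()) / (word * rc) - k / word
    return eq6, eq7


row = [900, 1023, 721, 256, 1, 10, 700, 20]
print(eta2(row))            # both equal 1 - 91/512
half_and_half = [1] * 4 + [2**63] * 4  # bit-lengths 1 and 64
print(eta2(half_and_half))  # both equal 0.3828125
```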



Results 

Random Matrix Generation. In both methods, compression efficiency depends on the distribution
of the bit-lengths $b(m_{ij})$. Thus, in this section, a method to generate a variety of random bit-length
distributions is proposed.
For simplicity we will model the distribution of bit-lengths as a mixture X of two Beta distributions,
$B_1 \sim \mathrm{Beta}(\alpha_1, \beta_1)$ and $B_2 \sim \mathrm{Beta}(\alpha_2, \beta_2)$, whose probability function is shown in Equation (8). Since the Beta
distribution is defined only on the interval $[0, 1] \subset \mathbb{R}$, we applied a simple transformation, $\lfloor 64x + 1 \rfloor$,
to the mixture in order to map it to the interval $[1, 64] \subset \mathbb{Z}$.

$$F(x) = w\,\mathrm{Beta}(\alpha_1, \beta_1) + (1 - w)\,\mathrm{Beta}(\alpha_2, \beta_2) \quad (8)$$

The intention of using this mixture was to find a simple way to represent a large variety of bit-length
distributions. The first two moments of this mixture are given in Equation (9) and will be used later to
summarize our numerical results.

$$E(X) = w\,E(B_1) + (1 - w)\,E(B_2) = w\,\frac{\alpha_1}{\alpha_1 + \beta_1} + (1 - w)\,\frac{\alpha_2}{\alpha_2 + \beta_2}$$

$$\mathrm{Var}(X) = w\,\mathrm{Var}(B_1) + (1 - w)\,\mathrm{Var}(B_2) + w(1 - w)\left(E(B_1) - E(B_2)\right)^2 \quad (9)$$

In order to explore the compression efficiency of both methods, we generated samples from the mixture
defined above, varying its parameters. From now on, when we mention the Beta distribution we mean
the transformed version defined above.

We will apply Equations (5) and (7) to determine the compression efficiency of the SM and
VLB methods for random matrices generated as described above.
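The sampling procedure can be sketched with NumPy as follows. This is an illustration, not the authors' code: `sample_bit_lengths` and `efficiencies` are hypothetical names, the map ⌊64x + 1⌋ is clipped at 64 for the measure-zero case x = 1, and k in Equation (7) is taken from the sample's largest bit-length.

```python
import numpy as np


def sample_bit_lengths(rng, n, a1, b1, a2, b2, w):
    """Draw n bit-lengths from w*Beta(a1, b1) + (1 - w)*Beta(a2, b2),
    mapped from [0, 1] to {1, ..., 64} with floor(64x + 1), clipped at 64."""
    from_first = rng.random(n) < w
    x = np.where(from_first, rng.beta(a1, b1, n), rng.beta(a2, b2, n))
    return np.minimum(np.floor(64 * x + 1), 64).astype(np.int64)


def efficiencies(bits):
    """SM (Equation 5) and VLB (Equation 7) efficiencies for one sample."""
    max_b = int(bits.max())
    eta1 = (64 - max_b) / 64
    k = max_b.bit_length()  # b(b(max M)); max_b >= 1, so this matches Equation (1)
    eta2 = 1 - bits.mean() / 64 - k / 64
    return eta1, eta2, eta1 - eta2  # third value is D; D < 0 favours VLB


rng = np.random.default_rng(0)
bits = sample_bit_lengths(rng, 10_000, 1, 32, 1, 32, w=1.0)
eta1, eta2, D = efficiencies(bits)
print(f"eta1={eta1:.3f} eta2={eta2:.3f} D={D:.3f}")
```

With Beta(1, 32) most bit-lengths are small while a few larger ones set max M, which is the situation where VLB is expected to win.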

With w = 0, a single Beta distribution is used. In Figure 1 we show some distributions of bit-lengths
for some combinations of α and β. From the figure it can be seen that a large variety of unimodal
distributions can be generated in the interval [1, 64].

As we are sampling from a large set of bit-length distributions, represented by the mixture of Betas
presented above, in order to make our results more general we will base our analysis on the expected
bit-length of a sample, since the efficiency of both methods depends on it. So, from Equations (5) and (7),
the expected efficiencies become:

$$E(\eta_1) = 1 - \frac{k}{64} \quad (10)$$

$$E(\eta_2) = 1 - \frac{E(b)}{64} - \frac{k}{64} \quad (11)$$

where k, in Equation (11), is set to 7 (the bit-length required to represent the largest possible bit-length, 64). In
Equation (10), k is the bit-length of the greatest element, or, in the worst case, 64.

We will use the difference $D = E(\eta_1) - E(\eta_2)$ to compare the efficiency of the two methods. Thus a
positive D favors the SM method, while a negative D favors the VLB method.

The expected compression efficiencies in the following numerical experiments are calculated from 3
matrices of dimension 10,000, generated as described, and are presented in the tables and figures below.

In Figure 2 we can see the distribution of efficiencies and their difference for samples generated
from a single Beta distribution of bit-lengths. Note that both methods can achieve efficiencies greater
than 80% for matrices with very small numbers. Also note that the VLB method is more efficient in the
majority of cases.

Now let w = 0.5, i.e., matrices will have elements with bit-lengths coming from a mixture of Beta
distributions, $B_1 \sim \mathrm{Beta}(\alpha_1, \beta_1)$ and $B_2 \sim \mathrm{Beta}(\alpha_2, \beta_2)$. The expected value for this mixture is shown
in Equation (12).

$$E(B) = 0.5\,E(B_1) + 0.5\,E(B_2) \quad (12)$$
Figure 1. Histograms constructed from samples with 10,000 elements, generated from a Beta
distribution. The parameters used are shown below each panel: (a) α = β = 1; (b) α = 1, β = 32;
(c) α = 32, β = 1; (d) α = 64, β = 64.



For bit-lengths coming from a mixture (w > 0), let the expected efficiencies for the SM and VLB
methods be given by Equations (13) and (14). So now, instead of the efficiency being a function of the
greatest bit-length in the sample (which determines $b(\max M)$ in Equation (5) and k in Equation (7)), it will be
a function of $\max\{E(B_1), E(B_2)\}$.

$$E(\eta_1) = 1 - \frac{\max\{E(B_1), E(B_2)\}}{64} \quad (13)$$

$$E(\eta_2) = 1 - 0.5\,\frac{E(B_1)}{64} - 0.5\,\frac{E(B_2)}{64} - \frac{b(\max\{E(B_1), E(B_2)\})}{64} \quad (14)$$

As before, we generate 3 matrices of dimension 10,000 for each parameterization, and calculate the average
efficiencies (Equations (13) and (14)) and their difference D.

Before moving on to efficiency results and analyses, let's first inspect samples from the mixture of
transformed Beta distributions. Figures 3 and 4 show a few parameterizations and their resulting sample
distributions. It is important to note that from the mixture we can now generate bimodal distributions as
well as the unimodal types tested before. Since we are making statements about efficiency as a function
of the expected bit-length, it is important to verify whether these statements hold for bimodal distributions as
well.

After sampling the parameter space uniformly ([1, 5, 9, ..., 64], n = 65,536) and comparing efficiencies,
we summarized the results in Table 2. In it we see how many parameterizations (from our sample) favor
each method. We can also look at the distribution of efficiencies in our samples for each method (Figure 5),
which clearly demonstrates the greater expected efficiency of the VLB method (Figure 5(b)).

Figure 2. Comparing compression efficiency of methods 1 and 2. Panels: (a) difference D; (b) SM
efficiency ($\eta_1$); (c) VLB efficiency ($\eta_2$); (d) region where SM is more efficient ($\eta_1 > \eta_2$). The color
scale in (a), (b) and (c) represents average bit-length. In (a) we can see the difference $D = \eta_1 - \eta_2$: for most
combinations of α and β, D < 0, meaning the second method is more efficient at compressing a sample of
numbers with bit-lengths coming from a Beta(α, β) distribution. However, there is a small region in
parameter space, shown in white in (d), where the SM method is more efficient. This region
corresponds to the red dots in (a), where the average bit-length is higher. Panels (b) and (c) show
the efficiencies of the SM and VLB methods, respectively.

As we have shown, the VLB method is more effective at compressing most integer datasets with elements of up to 64
bits. This is due to its ability to exploit the variance of the bit-lengths in the data set and reduce the waste of
bits in the representation of some numbers. In specific cases where this variance is null or very
small, method 1 (SM) will be more efficient. As a matter of fact, for matrices where all elements have the same
bit-length, the SM method will always be better, regardless of bit-length (Figures 6(a) and 6(b)). The only
exception is bit-length 64, where neither method is able to compress the data.



Discussion

Calculating Efficiencies

To determine the best compression method to apply, it is necessary to inspect the distribution of
bit-lengths of the matrix elements. When matrix elements are small or have nearly constant bit-length, the SM
method is best; otherwise, the VLB method should be chosen.

As an example, let $M_{r \times c}$ be an integer matrix such that half of its elements have bit-length 1 and
the other half bit-length 64. Recalling Equation (7), we now have two groups of elements (by bit-length), $b_1 = 1$
and $b_2 = 64$, with $f_i = rc/2$ for i = 1 and 2. As the greatest bit-length is 64, k = 7. Compression efficiency
$\eta_2$ can be calculated using Equation (7). After plugging in our numbers, we obtain a compression of about 38.28%:

$$\eta_2 = 1 - \frac{1 \times \frac{rc}{2} + 64 \times \frac{rc}{2}}{64 \times rc} - \frac{7}{64} = 1 - \frac{32.5 \times rc}{64 \times rc} - \frac{7}{64} \approx 1 - 0.5078 - 0.1094 \approx 38.28\%$$

Figure 3. Histograms constructed from samples with 10,000 elements, generated from the mixture of
two Beta distributions with w = 0.5 (horizontal axis: bit number). The parameters of the mixture are shown
below each panel: (a) α1 = 1, β1 = 1, α2 = 1, β2 = 1; (b) α1 = 1, β1 = 32, α2 = 32, β2 = 1;
(c) α1 = 32, β1 = 32, α2 = 32, β2 = 32; (d) α1 = 64, β1 = 32, α2 = 32, β2 = 64.
Figure 4. Histograms constructed from samples with 10,000 elements, generated from the mixture of
two Beta distributions with w = 0.5. The parameters of the mixture are shown below each panel:
(a) α1 = 64, β1 = 48, α2 = 1, β2 = 48; (b) α1 = 16, β1 = 46, α2 = 49, β2 = 64;
(c) α1 = 16, β1 = 16, α2 = 16, β2 = 49; (d) α1 = 1, β1 = 16, α2 = 1, β2 = 49.



The efficiency of the VLB method is influenced by the relative size of the bit-length groups. In this
first example we considered only two groups, each comprising half of the matrix elements. Let's now vary
the relative frequency of the groups, while sticking to two groups. Let's also assume that $f_i / rc$ is a good
approximation to the probability of a given bit-length in a matrix, which we will denote by $p_i$.

With this definition we can rewrite Equation (7), which becomes Equation (15). In Equation (15), $f_i / rc$ is
replaced by $p_i$, representing the probability of elements from group i in matrix M:

$$\eta_2 = 1 - \frac{\sum_{i=1}^{g} b_i \times p_i}{64} - \frac{k}{64} \quad (15)$$

with

$$p_i = \frac{f_i}{rc} \quad (16)$$

With Equation (15) we can analyze the influence of bit-length probability on compression efficiency. In
this example, $p_1$ and $p_2$ represent the probabilities of elements of bit-lengths 1 and 64, respectively. Thus,
efficiency is defined in Equation (17).






Table 2. Efficiency comparison of the SM and VLB methods for parameters covering uniformly the
support of B. Column n shows the number of parameter combinations for which each method has
superior compression.

Method   n        Percentage
SM       592      0.9034%
VLB      64944    99.0966%
Total    65536    100%



Figure 5. Efficiency histograms of the SM and VLB methods: (a) efficiency histogram of the SM method;
(b) efficiency histogram of the VLB method. Note that the VLB method has a greater average efficiency
than the SM method.



$$\eta_2 = 1 - \frac{1 \times p_1 + 64 \times p_2}{64} - \frac{7}{64} \quad (17)$$

Now we can determine which probabilities give us the best and worst compression levels. The first
equation in each system below comes from the law of total probability; the second comes from Equation (17)
after fixing the value of $\eta_2$. Setting $\eta_2 = 1$ gives

$$\begin{cases} p_1 + p_2 = 1 \\ p_1 + 64 p_2 = -7 \end{cases} \quad (18)$$

which has no solution with non-negative probabilities, so perfect compression cannot be reached. Within
this two-group example the efficiency is maximal when $p_1 = 1$ and $p_2 = 0$, where
$\eta_2 = 1 - 1/64 - 7/64 = 87.5\%$. Setting $\eta_2 = 0$ gives the break-even point:

$$\begin{cases} p_1 + p_2 = 1 \\ p_1 + 64 p_2 = 57 \end{cases} \quad (19)$$

Thus, when $p_1 = 0.1111$ and $p_2 = 0.8889$ the VLB method neither saves nor wastes memory; for smaller $p_1$
its efficiency becomes negative, reaching its minimum, $\eta_2 = 1 - 64/64 - 7/64 \approx -10.9\%$, at $p_1 = 0$, $p_2 = 1$.
For other combinations see Table 3. Looking at this table, one can see two negative efficiencies, when
$(p_1, p_2)$ assume the values (0, 1) and (0.1, 0.9). These correspond to cases where the method increases the memory
requirements instead of decreasing them.
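The two-group analysis can be checked numerically with a short NumPy sketch: `numpy.linalg.solve` solves the break-even system (19), and a hypothetical helper `eta2_two_groups` evaluates Equation (17) to regenerate the rows of Table 3 (the third decimal may differ from the table by rounding).

```python
import numpy as np


def eta2_two_groups(p1, b1=1, b2=64, k=7):
    """Equation (17): two bit-length groups with probabilities p1 and p2 = 1 - p1."""
    p2 = 1 - p1
    return 1 - (b1 * p1 + b2 * p2) / 64 - k / 64


# Break-even system (19): p1 + p2 = 1 and p1 + 64 * p2 = 57 (eta2 = 0)
A = np.array([[1.0, 1.0], [1.0, 64.0]])
p1, p2 = np.linalg.solve(A, np.array([1.0, 57.0]))
print(f"{p1:.4f} {p2:.4f}")  # 0.1111 0.8889

# Rows of Table 3
for p in np.round(np.linspace(0.0, 1.0, 11), 1):
    print(f"{p:.1f}  {1 - p:.1f}  {eta2_two_groups(p):.3f}")
```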

So far, we have examined only two groups (hence two probabilities) of bit-lengths for the sake of
simplicity. Before we generalize to probability distributions, let's take a quick look at the efficiencies for
more groups with uniform probability (listed after Table 3):
Figure 6. Compression efficiency of the SM and VLB methods for matrices of constant bit-length:
(a) efficiencies of the SM and VLB methods for constant bit-length matrices; (b) $\eta_1 - \eta_2$ for matrices of
constant bit-length.
Table 3. Combinations of $p_1$ and $p_2$ used to calculate the efficiency $\eta_2$.

p1    p2    η2
0.0   1.0   -0.109
0.1   0.9   -0.010
0.2   0.8    0.087
0.3   0.7    0.185
0.4   0.6    0.284
0.5   0.5    0.382
0.6   0.4    0.481
0.7   0.3    0.579
0.8   0.2    0.678
0.9   0.1    0.776
1.0   0.0    0.875


• 3 groups with bit-lengths 1, 32 and 64 bits, efficiency $\eta_2 = 0.3854$,

• 5 groups with bit-lengths 1, 16, 32, 48 and 64 bits, efficiency $\eta_2 = 0.3875$,

• 9 groups with bit-lengths 1, 8, 16, 24, 32, 40, 48, 56 and 64 bits, efficiency $\eta_2 = 0.3889$.

When the distribution of the group probabilities is uniform, i.e., the groups have approximately the 
same size, efficiency is basically the same, regardless of the number of groups. 
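These values follow from Equation (15) with $p_i = 1/g$ and k = 7; the short Python sketch below (hypothetical helper `eta2_equal_groups`) reproduces them as 0.3854, 0.3875 and 0.3889. Only the mean bit-length matters, and it stays close to 32 for all three groupings, which is why the number of groups barely changes the result.

```python
def eta2_equal_groups(bit_lengths, k=7):
    """Equation (15) with p_i = 1/g for each of the g groups."""
    g = len(bit_lengths)
    return 1 - sum(b / g for b in bit_lengths) / 64 - k / 64


for groups in ([1, 32, 64], [1, 16, 32, 48, 64], [1, 8, 16, 24, 32, 40, 48, 56, 64]):
    print(len(groups), round(eta2_equal_groups(groups), 4))
```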

Now we can leverage the notion of bit-length probabilities and study efficiency when bit-lengths follow
some commonly used discrete probability distributions: Discrete Uniform, Binomial and Poisson. For all
the experiments, we assume k = 7, that is, the maximum possible bit-length is 64 bits. Thus, the efficiency
obtained will not be the best possible, since for that we would need to assume small values of k (Equation (15)).

Discrete Uniform 

Let the bit-lengths of the matrices be distributed according to the Uniform distribution U(a = 1, b = 64),
which means bit-lengths may take values in the set {1, 2, 3, ..., 64} with equal probability, i.e., $p(b_i) = 1/64$.

Theoretical Efficiency: Let the random variable $B \sim U(a = 1, b = 64)$ represent the bit-length of
the elements of matrix M. Then $E(b_i) = \sum_i b_i \times p(b_i) = \frac{a + b}{2}$. Applying this result to the expected
compression efficiency of the VLB method (Equation (15)), we have

$$E(\eta_2) = 1 - \frac{E(B)}{64} - \frac{k}{64} = 1 - \frac{a + b}{2 \times 64} - \frac{k}{64} \quad (20)$$

Assuming all bit-lengths are possible, i.e., a = 1 and b = 64, and hence k = 7, we can calculate $\eta_2$:

$$E(\eta_2) = 1 - \frac{(1 + 64)/2}{64} - \frac{7}{64} \approx 38.28\% \quad (21)$$

This result agrees with the numerical estimates presented in Table 4.



Numerical Estimates: To calculate the VLB efficiency, we generated matrices with 100 ($M_{10 \times 10}$),
10,000 ($M_{100 \times 100}$) and 1,000,000 ($M_{1000 \times 1000}$) elements, with maximum bit-lengths of 1, 8, 16, 32 and 64 bits. The
average efficiency (Table 4) is calculated from 1,000 replicates of each matrix size. As expected, the
compression efficiency gets better with lower expected bit-length.

Table 4. Compression efficiency of the VLB method for samples with bit-lengths coming from a Discrete
Uniform distribution U(a = 1, b ≤ 64). Average efficiency (± SD) was calculated over 1,000
replicates.

                          Sample size (matrix elements)
Maximum bit-length (b)    100              10,000           1,000,000
1                         0.8750±0.0000    0.8750±0.0000    0.8750±0.0000
8                         0.8202±0.0031    0.8203±0.0003    0.8203±0.0000
16                        0.7580±0.0066    0.7578±0.0007    0.7578±0.0001
32                        0.6330±0.0142    0.6329±0.0014    0.6328±0.0001
64                        0.3826±0.0288    0.3828±0.0028    0.3828±0.0003
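A Monte Carlo sketch in the spirit of Table 4, using NumPy. It assumes bit-lengths drawn uniformly from {1, ..., b} and k fixed at 7 as in the text; for speed it uses 100 replicates of 10,000 elements rather than the 1,000 replicates reported, so the figures should only roughly match the table. `vlb_efficiency` is a hypothetical helper implementing Equation (15) with sample frequencies.

```python
import numpy as np


def vlb_efficiency(bits, k=7):
    """Equation (15) with sample frequencies; k fixed at 7 as in the text."""
    return 1 - bits.mean() / 64 - k / 64


rng = np.random.default_rng(42)
n_elements, n_replicates = 10_000, 100  # fewer replicates than the 1,000 used in the text
results = {}
for b_max in (1, 8, 16, 32, 64):
    effs = [vlb_efficiency(rng.integers(1, b_max + 1, size=n_elements))
            for _ in range(n_replicates)]
    results[b_max] = (np.mean(effs), np.std(effs))
    print(b_max, f"{results[b_max][0]:.4f} ± {results[b_max][1]:.4f}")
```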



Binomial Distribution 

For the binomial distribution, we will use Bin(n, p), with the number of trials n representing the greatest
possible bit-length in the matrix, and np giving us the expected bit-length.

Theoretical Efficiency: Let the bit-length B be a random variable with Binomial distribution,
$B \sim \mathrm{Bin}(n = 64, p = 0.5)$. Then $E(b_i) = \sum_i b_i \times p(b_i) = n \times p$ and the efficiency becomes (with k = 7):

$$E(\eta_2) = 1 - \frac{n \times p}{64} - \frac{k}{64} = 1 - \frac{64 \times 0.5}{64} - \frac{7}{64} \approx 39.06\% \quad (22)$$

This again agrees with the estimates in Table 5.



Numerical Estimates: For these experiments, the parameter n represents the maximum bit-length
of the matrix elements and takes values in {1, 8, 16, 32, 64}. In this case, we evaluate the efficiency as a
function of the parameter n and of matrix size. Even though efficiency does not depend on matrix size, we
tried different sizes to test the stability of the compression algorithm. Results are shown in Table 5. As
expected, smaller bit-lengths lead to higher compression efficiencies.
Table 5. Compression efficiency with bit-lengths distributed according to a binomial distribution
Bin(n, 0.5). Parameter n ∈ {1, 8, 16, 32, 64} (first column) represents the maximum bit-length; since
p = 0.5, the expected bit-length is n/2.

                          Efficiency by sample size
Maximum bit-length (n)    100              1,000            1,000,000
1                         0.8828±0.0004    0.8828±0.0004    0.8828±0.0000
8                         0.8283±0.0022    0.8281±0.0002    0.8281±0.0000
16                        0.7656±0.0032    0.7656±0.0003    0.7656±0.0000
32                        0.6406±0.0045    0.6406±0.0004    0.6406±0.0000
64                        0.3910±0.0065    0.3906±0.0006    0.3906±0.0001



Poisson Distribution 

With bit-lengths drawn from a Poisson(λ), the parameter λ corresponds to the expected bit-length. For
the purpose of this analysis, this Poisson distribution is truncated at 64.

Theoretical Efficiency: Let the bit-length $B \sim \mathrm{Poisson}(\lambda = 32)$. In this case, $E(b_i) = \lambda$ and, with k = 7,
the efficiency becomes:

$$E(\eta_2) = 1 - \frac{\lambda}{64} - \frac{k}{64} \quad (23)$$

$$= 1 - \frac{32}{64} - \frac{7}{64} \approx 39.06\% \quad (24)$$

This result is in accordance with Table 6.

Numerical Estimates: The results for this simulation can be seen in Table 6. Note that as the number
of bits needed to represent the numbers increases, the compression efficiency decreases.
In this case we did not simulate λ = 64, since a large portion of the samples would fall above the
maximum bit-length considered in this analysis.

Table 6. Compression efficiency with bit-lengths distributed according to a Poisson distribution Poisson(λ),
where λ represents the expected bit-length.

                          Efficiency η2 by matrix size
Expected bit-length (λ)   100              1,000            1,000,000
1                         0.8751±0.0015    0.8750±0.0004    0.8750±0.0000
8                         0.7654±0.0046    0.7656±0.0005    0.7656±0.0000
16                        0.6405±0.0064    0.6406±0.0006    0.6406±0.0001
32                        0.3908±0.0087    0.3906±0.0009    0.3906±0.0001

These results show that, when bit-lengths follow the tested distributions, good compression is obtained
regardless of sample size.
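The binomial and Poisson estimates of Equations (22) to (24) can be checked with a NumPy sketch under two stated assumptions: samples are used as drawn (so E(b) = np and E(b) = λ, as the theoretical expressions imply), and the truncation at 64 is implemented here by clipping, which the text does not specify.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
k = 7  # header width used throughout the text

binom_bits = rng.binomial(64, 0.5, size=n)           # Bin(n=64, p=0.5)
pois_bits = np.minimum(rng.poisson(32, size=n), 64)  # assumed truncation: clip at 64

estimates = {}
for name, bits in (("Binomial(64, 0.5)", binom_bits), ("Poisson(32)", pois_bits)):
    estimates[name] = 1 - bits.mean() / 64 - k / 64  # Equations (22) and (23)
    print(name, round(estimates[name], 4))           # both close to 0.3906
```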
Conclusion

In this paper, we have focused on the compression of matrix data, since this is one of the most important
applications the authors foresee. However, the compression methodology presented can be applied to any
numerical data structure, with gains in performance and memory footprint [cite Crysttian's thesis and possible
derived articles].

Further discussion about computing with such compressed data structures will be the subject
of another manuscript (in preparation), in which we will present details about the implementation of the
compression algorithm and benchmarks on classical linear algebra tasks such as those in Linpack [ref].

For the compression calculations presented in this paper we limited the bit-length of integers to 64 bits.
However, the compression would work in the same way as discussed for computer architectures with larger
word sizes.

Representation of floating-point numbers is also possible within the proposed compression framework,
but at the expense of precision in their representation. Although this may sound like a limitation, when
we take into consideration that most experimental data have fewer "significant" digits than the maximal
precision available in modern computers, fairly good compression may still be achievable for floats.